Support vector machine (SVM) models for predicting inhibitors of the 3′ processing step of HIV-1 integrase
Xuan, S.Y.; Wang, M.L.; Kang, H.; Kirchmair, J.; Tan, L.; Yan, A.X.*
Molecular Informatics, 2013, 32(9-10), 811-826.
Inhibition of the 3′ processing (3′P) step of HIV-1 integrase by small-molecule inhibitors is one of the most promising strategies for the treatment of AIDS. Using a support vector machine (SVM) approach, we developed six classification models for predicting the bioactivity of 3′P inhibitors. The models are based on up to 48 selected molecular descriptors and a comprehensive data set of 1253 molecules, with experimentally reported IC50 values ranging from nanomolar to micromolar. Model B2, the best-performing SVM model, achieved a prediction accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) of 93 %, 81 %, 94 % and 0.67, respectively, on the test set. Hydrogen-bonding features and, more generally, hydrophilicity were identified as key determinants of inhibitory activity. Further important properties include molecular refractivity, π atom charge, total charge, lone pair electronegativity, and effective atom polarizability. A comparative fragment-based analysis of the highly active and weakly active molecules corroborated these observations and revealed several characteristic structural elements of 3′P inhibitors. The models built in this study can be obtained from the authors.
Performance of the classification models on the data set of 1253 3′P inhibitors of HIV-1 integrase
Model name | Algorithm | Descriptors | Splitting method | Training set size | Training set accuracy (%) | Training set 5-fold cross-validation accuracy (%) | Training set 10-fold cross-validation accuracy (%) | Training set leave-one-out (LOO) cross-validation accuracy (%) | Test set size | Test set sensitivity (SE, %) | Test set specificity (SP, %) | Test set accuracy (%) | Test set MCC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model A1 | SVM | 41 MOE | Random | 493 | 95.94 | 82.76 | 82.56 | 83.57 | 760 | 69.33 | 88.91 | 86.97 | 0.4641 |
Model A2 | SVM | 41 MOE | Kohonen’s self-organizing map (SOM) | 537 | 92.92 | 79.70 | 79.33 | 79.70 | 716 | 68.83 | 93.58 | 90.92 | 0.5726 |
Model B1 | SVM | 41 MOE + 7 RDF | Random | 493 | 99.39 | 84.38 | 83.37 | 85.40 | 760 | 69.33 | 90.07 | 88.03 | 0.4859 |
Model B2 | SVM | 41 MOE + 7 RDF | Kohonen’s self-organizing map (SOM) | 537 | 98.32 | 79.70 | 79.89 | 81.56 | 716 | 80.52 | 94.21 | 92.74 | 0.6707 |
Model C1 | SVM | MACCS | Random | 493 | 96.35 | 81.74 | 83.77 | 84.18 | 760 | 38.51 | 96.32 | 86.05 | 0.4465 |
Model C2 | SVM | MACCS | Kohonen’s self-organizing map (SOM) | 537 | 92.92 | 81.01 | 81.38 | 80.45 | 716 | 51.92 | 96.24 | 89.80 | 0.5478 |
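
The test set columns (SE, SP, accuracy, MCC) all follow from a single confusion matrix. As a minimal sketch, the snippet below computes them with the standard definitions. The function name `classification_metrics` is illustrative, and the counts (62 true positives, 15 false negatives, 602 true negatives, 37 false positives) are an assumption: they are not reported in the source, but were back-calculated here as one set of integers that sums to the 716 test molecules of Model B2 and matches its reported percentages.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy (in %) plus the MCC."""
    sensitivity = 100.0 * tp / (tp + fn)   # fraction of actives recovered
    specificity = 100.0 * tn / (tn + fp)   # fraction of inactives recovered
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# ASSUMPTION: these counts are inferred from the reported percentages for
# Model B2 (716 test molecules); they are not given in the source.
se, sp, acc, mcc = classification_metrics(tp=62, fn=15, tn=602, fp=37)
print(f"SE={se:.2f}%  SP={sp:.2f}%  accuracy={acc:.2f}%  MCC={mcc:.4f}")
# prints: SE=80.52%  SP=94.21%  accuracy=92.74%  MCC=0.6707
```

Under these assumed counts only 77 of the 716 test molecules are active, which would explain why accuracy stays high across all six models while sensitivity and MCC vary much more: MCC is the more informative column for comparing models on a class-imbalanced set.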
Main project members
PhD student
PhD student